Introduction

For our exploratory analysis, we used datasets from the Center for Disease Control and Prevention and a dataset collected by the Bureau of Labor Statistics (BLS) Local Area Unemployment Statistics (LAUS) program. We decided to focus on the relationship between economic status versus cancer deaths per case and cases per population. We also focus our analysis in the West Coast of the United States: Washington, Oregon, and California.

Distribution of Variables

Age-Adjusted Rates of New Cases vs Sex in 2017

The Box plot below shows the age adjusted rate of new cancer cases based on sex/gender of a person. The box plot takes into account the data of all three of the west coast states that our group will be analyzing. At first glance, it is obvious that there seems to be a lot more cases within the male population than the female population. The q1 difference between male and female is about 53 cases per 100,000 people. The q3 difference is about 77 cases per 100,000 people. The body of the box plot obviously shows a huge disparity. In every aspect, at least in the west coast, it seems that males are more likely to be diagnosed than women.

sex_dist

Age-Adjusted Rates of New Cases vs Race in 2017

This second box plot below shows the age adjusted rates of new cases between different races. The most obscure part of this graph is the fact that the body of the plot for Native Americans seem to be the largest of any group. Though it does not seem to effect the ‘all races’ plot too much, leading one to think that the sample size seem to be smaller than the rest. The ’all races plot seem to correlate the most with African American and white Americans the most. This makes sense as those are the majority groups within the US. AAPI seems to have the lowest average and lowest minimum as well. Hispanics seem to hold thier own between the two majority groups and the AAPI group.

race_dist

Relationships between Variables

Age-Adjusted Death Rates VS Median Household Income

For the bubble plot below, we compared the median household income in 2019 in Washington, Oregon, and California to the age adjusted death rates to understand the relationship that economic status has on the mortality rate from cancer. We used the gapminder and plotly packages to color code the states and visualize this relationship. From the visualization, we noticed higher income counties in California had a lower mortality rate than lower income counties. This may be due to the availability of better hospital’s with cancer treatments within higher income counties.

Age-Adjusted Rate of New Cancer VS Counties

The map below displays the age-adjusted rate of new cancers. We used the map functionality within ggplot to retrieve the longitudes and longitudes for the three states we were focusing on, and we merged the geographical dataset and the original comprehensive dataset to generate a choropleth map that displays lighter colors for a greater age-adjusted case rate.